Another common transformation of a time series is to apply a function over a fixed rolling window of data.
Note that rolling functions differ conceptually from aggregates: they are not calculated over disjoint subsets of the data, so the output has the same frequency as the original data.
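The difference between an aggregate and a rolling function can be illustrated with a minimal base-R sketch. It assumes the built-in AirPassengers series matches the value column of airpassengers_tbl, and it uses stats::filter() as a stand-in for the rollapply() approach used in this section.

```r
# Base-R sketch; AirPassengers (datasets package) is assumed to match the
# `value` column of airpassengers_tbl.
x    <- as.numeric(AirPassengers)    # 144 monthly observations, 1949-1960
year <- rep(1949:1960, each = 12)    # calendar year of each observation

# Aggregate: one mean per disjoint calendar year.
yearly_mean <- tapply(x, year, mean)

# Rolling: a right-aligned 12-month mean, one (possibly NA) value per month.
rolling_mean <- as.numeric(stats::filter(x, rep(1 / 12, 12), sides = 1))

length(yearly_mean)    # one value per year
length(rolling_mean)   # one value per month
```

The aggregate collapses the data to one value per year, whereas the rolling mean keeps one entry per month, with missing values wherever the window is incomplete.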
Moving Averages
A common rolling function is the moving average: we calculate the average value of the time series over a fixed window of data.
ap_rollmean_sixmonth_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value
    ,mutate_fun = rollapply
    # rollapply args
    ,width = 6
    ,align = "right"
    ,FUN = mean
    # mean args
    ,na.rm = TRUE
    # tq_mutate args
    ,col_rename = "mean_6m"
  )
ap_rollmean_sixmonth_tbl %>% glimpse()
## Observations: 144
## Variables: 3
## $ month <S3: yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, J...
## $ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,...
## $ mean_6m <dbl> NA, NA, NA, NA, NA, 124.500, 130.500, 135.500, 136.167, 134...
ap_rollmean_sixmonth_tbl %>% print()
## # A tibble: 144 x 3
## month value mean_6m
## <S3: yearmon> <dbl> <dbl>
## 1 Jan 1949 112 NA
## 2 Feb 1949 118 NA
## 3 Mar 1949 132 NA
## 4 Apr 1949 129 NA
## 5 May 1949 121 NA
## 6 Jun 1949 135 124.
## 7 Jul 1949 148 130.
## 8 Aug 1949 148 136.
## 9 Sep 1949 136 136.
## 10 Oct 1949 119 134.
## # ... with 134 more rows
We compare the two series by plotting the original time series against its moving average.
plot_tbl <- ap_rollmean_sixmonth_tbl %>%
  rename(orig = value) %>%
  gather('label', 'value', -month)
ggplot(plot_tbl) +
  geom_line(aes(x = month, y = value, colour = label)) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Comparison Plot of the Air Passenger Counts')

Note that the moving-average series does not start at the same timestamp as the original: a right-aligned six-month window needs six observations before it can produce a value, so the first five entries are NA.
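The first moving-average values can be checked by hand. The sketch below again uses base R only and assumes the built-in AirPassengers series matches airpassengers_tbl; stats::filter() with sides = 1 is used here as a right-aligned moving average in place of rollapply().

```r
# Base-R check of the right-aligned 6-month moving average.
x <- as.numeric(AirPassengers)

# sides = 1 uses the current value and the five before it (right-aligned).
roll6 <- as.numeric(stats::filter(x, rep(1 / 6, 6), sides = 1))

head(roll6, 7)             # five NA values, then the first two averages
mean(x[1:6])               # Jan-Jun 1949, the value reported for Jun 1949
mean(x[2:7])               # Feb-Jul 1949, the value reported for Jul 1949
which(!is.na(roll6))[1]    # position of the first complete window
```

The two hand-computed means should agree with the mean_6m values shown for June and July 1949 in the output above.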
We can add multiple moving averages to a time series by chaining a series of tq_mutate() calls together.
ap_rollmean_multi_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value
    ,mutate_fun = rollapply
    # rollapply args
    ,width = 6
    ,align = "right"
    ,FUN = mean
    # mean args
    ,na.rm = TRUE
    # tq_mutate args
    ,col_rename = "mean_6m"
  ) %>%
  tq_mutate(
    # tq_mutate args
    select = value
    ,mutate_fun = rollapply
    # rollapply args
    ,width = 12
    ,align = "right"
    ,FUN = mean
    # mean args
    ,na.rm = TRUE
    # tq_mutate args
    ,col_rename = "mean_12m"
  )
ap_rollmean_multi_tbl %>% glimpse()
## Observations: 144
## Variables: 4
## $ month <S3: yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, ...
## $ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118...
## $ mean_6m <dbl> NA, NA, NA, NA, NA, 124.500, 130.500, 135.500, 136.167, 13...
## $ mean_12m <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 126.667, 126.9...
ap_rollmean_multi_tbl %>% print()
## # A tibble: 144 x 4
## month value mean_6m mean_12m
## <S3: yearmon> <dbl> <dbl> <dbl>
## 1 Jan 1949 112 NA NA
## 2 Feb 1949 118 NA NA
## 3 Mar 1949 132 NA NA
## 4 Apr 1949 129 NA NA
## 5 May 1949 121 NA NA
## 6 Jun 1949 135 124. NA
## 7 Jul 1949 148 130. NA
## 8 Aug 1949 148 136. NA
## 9 Sep 1949 136 136. NA
## 10 Oct 1949 119 134. NA
## # ... with 134 more rows
As before, we now create a lineplot of the three series to show the effect of the different window sizes.
plot_tbl <- ap_rollmean_multi_tbl %>%
  rename(orig = value) %>%
  gather('label', 'value', -month)
ggplot(plot_tbl) +
  geom_line(aes(x = month, y = value, colour = label)) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Comparison Plot of the Air Passenger Counts')

The twelve-month series starts later than the six-month series because its wider window needs more observations before it can produce a value: it has eleven leading NA values rather than five.
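The relationship between window width and the number of missing leading values can be sketched as follows. The helper leading_na() is hypothetical and written only for this illustration; it uses base R's stats::filter() rather than rollapply(), and assumes the built-in AirPassengers series matches airpassengers_tbl.

```r
# Count the missing values produced by a right-aligned window of a given width.
x <- as.numeric(AirPassengers)

# leading_na() is a hypothetical helper for this illustration only.
leading_na <- function(width) {
  roll <- stats::filter(x, rep(1 / width, width), sides = 1)
  sum(is.na(roll))
}

sapply(c(3, 6, 12), leading_na)   # each count is width - 1
```

A right-aligned window of width w leaves w - 1 missing values at the start of the series, which is why wider windows yield shorter usable series.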
Other windowing functions can be applied in the same way. For example, a rolling standard deviation lets us draw a band of plausible values around the moving average.
ribbon_func <- function(x, na.rm = TRUE, ...) {
  mu <- mean(x, na.rm = na.rm)
  sigma <- sd(x, na.rm = na.rm)
  lower <- mu - 2 * sigma
  upper <- mu + 2 * sigma
  return(c(mu = mu, l2sd = lower, u2sd = upper))
}
ap_roll_ribbon_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value
    ,mutate_fun = rollapply
    # rollapply args
    ,width = 6
    ,align = "right"
    ,by.column = FALSE
    ,FUN = ribbon_func
    # ribbon_func args
    ,na.rm = TRUE
  )
ap_roll_ribbon_tbl %>% glimpse()
## Observations: 144
## Variables: 5
## $ month <S3: yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun...
## $ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 1...
## $ mu <dbl> NA, NA, NA, NA, NA, 124.500, 130.500, 135.500, 136.167, 134.5...
## $ l2sd <dbl> NA, NA, NA, NA, NA, 106.6674, 109.0058, 114.0058, 114.9472, 1...
## $ u2sd <dbl> NA, NA, NA, NA, NA, 142.333, 151.994, 156.994, 157.386, 159.6...
ap_roll_ribbon_tbl %>% print()
## # A tibble: 144 x 5
## month value mu l2sd u2sd
## <S3: yearmon> <dbl> <dbl> <dbl> <dbl>
## 1 Jan 1949 112 NA NA NA
## 2 Feb 1949 118 NA NA NA
## 3 Mar 1949 132 NA NA NA
## 4 Apr 1949 129 NA NA NA
## 5 May 1949 121 NA NA NA
## 6 Jun 1949 135 124. 107. 142.
## 7 Jul 1949 148 130. 109. 152.
## 8 Aug 1949 148 136. 114. 157.
## 9 Sep 1949 136 136. 115. 157.
## 10 Oct 1949 119 134. 109. 160.
## # ... with 134 more rows
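The first complete row of this table can be reproduced by hand, which is a useful check that the band is the window mean plus or minus two standard deviations. This is a base-R sketch, assuming the built-in AirPassengers series matches airpassengers_tbl.

```r
# Recompute the Jun 1949 row from the first six observations.
x <- as.numeric(AirPassengers)
first_window <- x[1:6]            # Jan-Jun 1949

mu    <- mean(first_window)
sigma <- sd(first_window)         # sample standard deviation (n - 1 denominator)

round(c(mu = mu, l2sd = mu - 2 * sigma, u2sd = mu + 2 * sigma), 3)
```

The result should agree with the mu, l2sd and u2sd values shown for June 1949, up to rounding.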
We now plot the original data against the moving average, together with a ribbon showing two standard deviations either side of it.
ggplot(ap_roll_ribbon_tbl) +
  geom_line(aes(x = month, y = value)) +
  geom_line(aes(x = month, y = mu), colour = 'red') +
  geom_ribbon(aes(x = month, ymin = l2sd, ymax = u2sd)
              ,colour = 'grey', alpha = 0.25) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Ribbon Plot of the Air Passenger Counts (6 month window)')

We now repeat this process using a twelve-month window for the data.
ap_roll_12m_ribbon_tbl <- airpassengers_tbl %>%
  tq_mutate(
    # tq_mutate args
    select = value
    ,mutate_fun = rollapply
    # rollapply args
    ,width = 12
    ,align = "right"
    ,by.column = FALSE
    ,FUN = ribbon_func
    # ribbon_func args
    ,na.rm = TRUE
  )
ap_roll_12m_ribbon_tbl %>% glimpse()
## Observations: 144
## Variables: 5
## $ month <S3: yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun...
## $ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 1...
## $ mu <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 126.667, 126.917,...
## $ l2sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 99.2264, 100.0100...
## $ u2sd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 154.107, 153.823,...
ap_roll_12m_ribbon_tbl %>% print()
## # A tibble: 144 x 5
## month value mu l2sd u2sd
## <S3: yearmon> <dbl> <dbl> <dbl> <dbl>
## 1 Jan 1949 112 NA NA NA
## 2 Feb 1949 118 NA NA NA
## 3 Mar 1949 132 NA NA NA
## 4 Apr 1949 129 NA NA NA
## 5 May 1949 121 NA NA NA
## 6 Jun 1949 135 NA NA NA
## 7 Jul 1949 148 NA NA NA
## 8 Aug 1949 148 NA NA NA
## 9 Sep 1949 136 NA NA NA
## 10 Oct 1949 119 NA NA NA
## # ... with 134 more rows
Having constructed the data, we once again create a ribbon plot with these quantities.
ggplot(ap_roll_12m_ribbon_tbl) +
  geom_line(aes(x = month, y = value)) +
  geom_line(aes(x = month, y = mu), colour = 'red') +
  geom_ribbon(aes(x = month, ymin = l2sd, ymax = u2sd)
              ,colour = 'grey', alpha = 0.25) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Passenger Total') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Ribbon Plot of the Air Passenger Counts (12 month window)')

Exercises
- Construct a 3 month moving average for the passenger data and compare it to the 6 and 12 month values.
- Calculate the 6 month and 12 month rolling average values for the Maine unemployment data.
- Construct the ribbon plot for the Maine unemployment data.
- Construct moving average data for the CBE dataset. This process may be made easier by reshaping the data.
Differences
Another common transformation of the data is to take the ‘first differences’ of the values, i.e. we convert the time series of values into one of differences. We discuss the reasons for this later on – for now we focus on the mechanics of creating first differences.
ap_firstdiff_tbl <- airpassengers_tbl %>%
  mutate(diff = value - lag(value, n = 1))
ap_firstdiff_tbl %>% glimpse()
## Observations: 144
## Variables: 3
## $ month <S3: yearmon> Jan 1949, Feb 1949, Mar 1949, Apr 1949, May 1949, Jun...
## $ value <dbl> 112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118, 1...
## $ diff <dbl> NA, 6, 14, -3, -8, 14, 13, 0, -12, -17, -15, 14, -3, 11, 15, ...
ap_firstdiff_tbl %>% print()
## # A tibble: 144 x 3
## month value diff
## <S3: yearmon> <dbl> <dbl>
## 1 Jan 1949 112 NA
## 2 Feb 1949 118 6
## 3 Mar 1949 132 14
## 4 Apr 1949 129 -3
## 5 May 1949 121 -8
## 6 Jun 1949 135 14
## 7 Jul 1949 148 13
## 8 Aug 1949 148 0
## 9 Sep 1949 136 -12
## 10 Oct 1949 119 -17
## # ... with 134 more rows
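For comparison, base R's diff() computes the same quantities. The sketch below assumes the built-in AirPassengers series matches airpassengers_tbl; the manual shift with c(NA, head(x, -1)) is only a stand-in that mimics value - lag(value, n = 1).

```r
# First differences two ways, using base R only.
x <- as.numeric(AirPassengers)

# Manual lag: shift the series down by one position and pad with NA.
d_lag  <- x - c(NA, head(x, -1))   # length 144, first entry NA

# diff() drops the undefined first entry instead of padding it.
d_base <- diff(x)                  # length 143

head(d_lag, 5)
head(d_base, 4)
```

The lag-based version keeps the result aligned with the original rows, which is convenient inside mutate(); diff() returns a vector one element shorter.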
Having calculated the differences, we now produce a lineplot of those values.
plot_tbl <- ap_firstdiff_tbl %>%
  rename(count = value) %>%
  gather('series', 'value', -month)
ggplot(plot_tbl) +
  geom_line(aes(x = month, y = value, colour = series)) +
  expand_limits(y = 0) +
  xlab('Month') +
  ylab('Value') +
  scale_x_yearmon() +
  scale_y_continuous(labels = comma) +
  ggtitle('Plot of the Air Passenger Counts and First Differences')

As we see in this plot, the first differences of the passenger data do not contain a visible trend.
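One rough way to put a number on this visual impression is to fit a straight line against time to each series and compare the slopes. This is only an illustrative sketch, not a formal test for trend; it uses base R and assumes the built-in AirPassengers series matches airpassengers_tbl.

```r
# Compare the fitted linear slope before and after differencing.
x <- as.numeric(AirPassengers)
d <- diff(x)

t_x <- seq_along(x)   # time index for the original series
t_d <- seq_along(d)   # time index for the differenced series

slope_orig <- unname(coef(lm(x ~ t_x))[2])   # passengers per month
slope_diff <- unname(coef(lm(d ~ t_d))[2])   # change in difference per month

c(original = slope_orig, differenced = slope_diff)
```

The slope of the differenced series should be much closer to zero than that of the original counts, consistent with the plot.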
Exercises
- Calculate the first differences for the Maine unemployment data.
- Create a lineplot of these differences and check whether any trend remains.
- Calculate the first differences for the CBE data.
- Create lineplots for the CBE differences.
- Using the lag() function with the Air Passenger data, calculate the percentage changes instead of the arithmetic changes.
- Construct the lineplot for the percentage change values.